Discriminant analysis with singular covariance matrices. A method incorporating cross-validation and efficient randomized permutation tests

1996 ◽  
Vol 10 (3) ◽  
pp. 189-213 ◽  
Author(s):  
Philip Jonathan ◽  
W. V. (Mac) McCarthy ◽  
Adrian M. I. Roberts
2018 ◽  
Author(s):  
Παντελής Σταυρούλιας

Valid predictions of financial crises have always safeguarded the stability of the financial system as a whole and of the banking sector in particular. This thesis predicts systemic banking crises for EU-14 countries several quarters before they become apparent, using the most widely used variables (macroeconomic, banking, and market) through two approaches, binary and multilevel. Under the binary approach, classification models are derived by applying Discriminant Analysis, Linear Regression, Logistic Regression, and Probit Regression for the early prediction of crises 12 to 7 quarters before their onset. In addition, the performance of this analysis is compared with the newer and more promising methods of the Classification Tree, Random Forest, and C5. A new measure for threshold selection and goodness of fit (GoF) of the prediction models and a new combined classification method are also proposed. To examine the performance of the analysis, out-of-sample testing is used with country-blocked cross-validation. Under this method, the analysis is carried out and the prediction models are derived using thirteen of the fourteen countries in the sample (in-sample); the derived models are applied to the fourteenth country, which had been excluded from the initial sample (out-of-sample), and the predictions are checked against that country's actual data. This procedure is repeated fourteen times, each time leaving one country out of the sample, and the results are finally averaged over the repetitions.
Using this out-of-sample test, the thesis achieves 82.4% correct classification (Accuracy), a 78.4% True Positive Rate (TPR), and an 80.6% Positive Predictive Value (PPV). Under the multilevel approach, two levels (periods) of prediction of systemic banking crises are distinguished. The first level is called early warning and covers the period from 12 to 7 quarters before the onset of the crisis, while the second level is called late warning and covers the period from 6 to 1 quarters before the onset. For this multilevel classification, Neural Networks, Multinomial Logistic Regression, and Multinomial (multilevel linear) Discriminant Analysis are used. Applying the same out-of-sample test as in the first approach, 85.7% correct classification is achieved with the best method, which proves to be Multinomial (multilevel linear) Discriminant Analysis. By applying this analysis, interested policy makers can detect a crisis up to three years ahead with the proposed models, using only publicly available data, and thereby exercise the macroprudential policy appropriate to each case.
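
The country-blocked cross-validation described in this abstract can be illustrated with a minimal sketch. It is not the thesis's code: the one-feature threshold classifier, the function names, and the toy data for three countries are all assumptions chosen only to show how each country is held out in turn and how Accuracy, TPR, and PPV are averaged over the folds.

```python
def leave_one_group_out(groups):
    """Yield (held_out, train_idx, test_idx) once per distinct group label."""
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test


def fit_threshold(xs, ys):
    """Toy stand-in classifier (an assumption): midpoint of the two class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def metrics(y_true, y_pred):
    """Return (accuracy, TPR, PPV) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    tpr = tp / (tp + fn) if tp + fn else float("nan")
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return accuracy, tpr, ppv


def country_blocked_cv(x, y, country):
    """Fit on all countries but one, test on the held-out one, average the folds."""
    per_fold = []
    for _held_out, train, test in leave_one_group_out(country):
        threshold = fit_threshold([x[i] for i in train], [y[i] for i in train])
        pred = [1 if x[i] > threshold else 0 for i in test]
        per_fold.append(metrics([y[i] for i in test], pred))
    n_folds = len(per_fold)
    return tuple(sum(m[j] for m in per_fold) / n_folds for j in range(3))


# Hypothetical data: one indicator value per observation, label 1 = pre-crisis.
country = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
x = [0.1, 0.2, 0.8, 0.9, 0.2, 0.3, 0.7, 0.6, 0.1, 0.6, 0.9, 0.8]
y = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]

accuracy, tpr, ppv = country_blocked_cv(x, y, country)
print(f"accuracy={accuracy:.3f} TPR={tpr:.3f} PPV={ppv:.3f}")
```

Blocking by country keeps every observation of the test country out of model fitting, which an ordinary random split of the pooled observations would not guarantee.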


2020 ◽  
Vol 2 (2) ◽  
pp. 29-38
Author(s):  
Abdur Rohman Harits Martawireja ◽  
Hilman Mujahid Purnama ◽  
Atika Nur Rahmawati

Human face recognition is an important research field, and many recent applications use it, in both commercial and law-enforcement settings. Face recognition is a biometric system that identifies a person from facial characteristics with high accuracy, and it can be applied in security systems. Many methods can be used in face recognition applications for system security; this article discusses two of them, Two-Dimensional Principal Component Analysis and Kernel Fisher Discriminant Analysis, with K-Nearest Neighbor as the classification method. Both methods are tested using cross-validation. Results from previous studies show that a face recognition system using Two-Dimensional Principal Component Analysis achieves an accuracy of 88.73% with 5-fold cross-validation and 89.25% with 2-fold cross-validation. Testing the Kernel Fisher Discriminant method with 2-fold cross-validation yields an average accuracy of 83.10%.
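
As a rough illustration of the evaluation scheme named in this abstract, the sketch below runs k-fold cross-validation around a K-Nearest Neighbor classifier. It is not the article's pipeline: the feature-extraction step (2DPCA or Kernel Fisher Discriminant Analysis) is omitted, the two-dimensional points stand in for extracted face features, and all function names are assumptions.

```python
from collections import Counter


def euclidean(p, q):
    """Euclidean distance between two equal-length coordinate sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def knn_predict(train_X, train_y, x, n_neighbors=1):
    """Majority vote among the n_neighbors training points closest to x."""
    order = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:n_neighbors])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, k, n_neighbors=1):
    """Mean accuracy over k folds; each fold is held out exactly once."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        hits = sum(knn_predict(train_X, train_y, X[i], n_neighbors) == y[i]
                   for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)


# Hypothetical feature vectors for two people; classes are interleaved so
# that contiguous folds contain both labels.
X = [(0, 0), (5, 5), (0, 1), (5, 6), (1, 0), (6, 5), (1, 1), (6, 6)]
y = ["a", "b", "a", "b", "a", "b", "a", "b"]

print(cross_val_accuracy(X, y, k=2))
print(cross_val_accuracy(X, y, k=4))
```

With real data the samples would normally be shuffled (or stratified by person) before the folds are cut; the contiguous split here only keeps the sketch short.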


1986 ◽  
Vol 14 (1-2) ◽  
pp. 159-176 ◽  
Author(s):  
Robert A. Nicholson ◽  
Joseph M. Horn

Eleven background, diagnostic, and hospitalization characteristics were used to discriminate committed and voluntary psychiatric patients in a double cross-validation design. Diagnosis was more important than individual social and status resources (race, marital status, education, and employment status) in discriminating the two groups of patients. Further, characteristics of hospitalization (length of stay, percentage of patients receiving maximum benefit from treatment, and frequency of discharge referrals) did not contribute significantly to discrimination of the two groups, suggesting that committed and voluntary patients did not differ with regard to the adequacy or effectiveness of treatment in the hospital.


2018 ◽  
Vol 148 ◽  
pp. 223-233 ◽  
Author(s):  
Jun Tong ◽  
Rui Hu ◽  
Jiangtao Xi ◽  
Zhitao Xiao ◽  
Qinghua Guo ◽  
...  

2017 ◽  
Vol 2017 ◽  
pp. 1-10 ◽  
Author(s):  
Fengnong Chen ◽  
Pulan Chen ◽  
Hamed Hamid Muhammed ◽  
Juan Zhang

The aim of the paper is to distinguish malignant from benign breast lesions using the apparent diffusion coefficient (ADC), perfusion fraction f, pseudodiffusion coefficient D⁎, and true diffusion coefficient D from intravoxel incoherent motion (IVIM). The study included 69 malignant cases (including 9 early malignant cases) and 35 benign cases, all of whom underwent diffusion-weighted MRI at 3.0 T with 8 b-values (0 to 1000 s/mm²). ADC and IVIM parameters were determined in lesions. The early malignant cases were treated in turn as advanced malignant and as benign tumors to assess the effect on the result. Predictive models were constructed using Support Vector Machine Binary Classification (SVMBC, also known as Support Vector Machine Discriminant Analysis (SVMDA)) and Partial Least Squares Discriminant Analysis (PLSDA), and the two were compared. The D value and ADC provide accurate identification of malignant lesions with b=300 when early malignant tumors are considered advanced malignant (cancer). The classification accuracy is 93.5% for cross-validation using SVMBC with ADC and tissue diffusivity only. The sensitivity and specificity are 100% and 87.0%, respectively, r²cv = 0.8163, and the root mean square error of cross-validation (RMSECV) is 0.043. ADC and IVIM provide quantitative measurements of tissue diffusivity for cellularity and, combined with SVMBC, yield comprehensive and complementary information for differentiating benign from malignant breast lesions.


2012 ◽  
Vol 2012 ◽  
pp. 1-15 ◽  
Author(s):  
Hervé Abdi ◽  
Lynne J. Williams ◽  
Andrew C. Connolly ◽  
M. Ida Gobbini ◽  
Joseph P. Dunlop ◽  
...  

We present a new discriminant analysis (DA) method called Multiple Subject Barycentric Discriminant Analysis (MUSUBADA), suited for analyzing fMRI data because it handles datasets with multiple participants who each provide a different number of variables (i.e., voxels) that are themselves grouped into regions of interest (ROIs). Like DA, MUSUBADA (1) assigns observations to predefined categories, (2) gives factorial maps displaying observations and categories, and (3) optimally assigns observations to categories. MUSUBADA handles cases with more variables than observations and can project portions of the data table (e.g., subtables, which can represent participants or ROIs) on the factorial maps. Therefore MUSUBADA can analyze datasets with different voxel numbers per participant and so does not require spatial normalization. MUSUBADA statistical inferences are implemented with cross-validation techniques (e.g., jackknife and bootstrap); its performance is evaluated with confusion matrices (for fixed and random models) and represented with prediction, tolerance, and confidence intervals. We present an example in which we predict the categories (houses, shoes, chairs, and human, monkey, and dog faces) of images watched by participants whose brains were scanned. This example corresponds to a DA question in which the data table is made of subtables (one per subject) and has more variables than observations.
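
The idea of assigning observations to the nearest category barycenter, and of contrasting a fixed-model confusion matrix with a jackknife (leave-one-out) one, can be sketched as follows. This is not MUSUBADA: the sketch assumes a single data table, works in the raw variable space with Euclidean distance, and skips the factorial maps and subtable projections entirely. The function names and toy data are hypothetical.

```python
def euclidean(p, q):
    """Euclidean distance between two equal-length coordinate sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def barycenters(X, y):
    """Mean vector (barycenter) of the observations in each category."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: tuple(s / counts[c] for s in sums[c]) for c in sums}


def assign(x, centers):
    """Category whose barycenter is closest to x (ties broken by label order)."""
    return min(sorted(centers), key=lambda c: euclidean(x, centers[c]))


def confusion(y_true, y_pred, labels):
    """Nested dict: table[true_label][predicted_label] -> count."""
    table = {t: {p: 0 for p in labels} for t in labels}
    for t, p in zip(y_true, y_pred):
        table[t][p] += 1
    return table


def fixed_and_random_confusion(X, y):
    """Fixed model: classify the training data themselves.
    Random model: jackknife, each observation is classified from barycenters
    computed without it (every category needs at least two observations)."""
    labels = sorted(set(y))
    centers = barycenters(X, y)
    fixed_pred = [assign(x, centers) for x in X]
    random_pred = []
    for i in range(len(X)):
        loo_centers = barycenters(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        random_pred.append(assign(X[i], loo_centers))
    return confusion(y, fixed_pred, labels), confusion(y, random_pred, labels)


# Hypothetical observations in two variables, two categories.
X = [(0.0, 0.0), (1.0, 0.0), (2.4, 0.0), (3.0, 0.0), (4.0, 0.0), (5.0, 0.0)]
y = ["a", "a", "a", "b", "b", "b"]

fixed, random_model = fixed_and_random_confusion(X, y)
print("fixed :", fixed)
print("random:", random_model)
```

On this toy table the fixed model classifies every observation correctly, while the jackknife misassigns the borderline point at (2.4, 0.0), which is the kind of optimism the random-model confusion matrix is meant to expose.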


2019 ◽  
Vol 23 (2) ◽  
pp. 130-141 ◽  
Author(s):  
Chun-Chang Lee ◽  
Chih-Min Liang ◽  
Yang-Tung Liu

This paper compares the predictive powers of hierarchical generalized linear modeling (HGLM), logistic regression, and discriminant analysis with regard to tenure choices between buying property and renting property by sampling the residents of the Greater Taipei area. The results imply that the hit rate and other indicators included in HGLM have better predictive power with regard to tenure choices than the binary logistic regression model and the discriminant analysis model. That is, using HGLM to process nested data can increase prediction accuracy regarding household tenure choices. Furthermore, cross-validation is performed to analyze hit rate stability. The hit rate sequencing from this cross-validation is found to be consistent with the HGLM results, implying that the comparison of the three models in terms of hit rate performance prediction in this study is stable and reliable.


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e11928
Author(s):  
Shanjia Li ◽  
Hui Wang ◽  
Ling Jin ◽  
James F. White ◽  
Kathryn L. Kingsley ◽  
...  

Background Place of origin is an important factor when determining the quality and authenticity of Angelica sinensis for medicinal use. It is important to trace the origin and confirm the regional characteristics of medicinal products for sustainable industrial development. Effectively tracing and confirming the material’s origin may be accomplished by detecting stable isotopes and mineral elements. Methods We studied 25 A. sinensis samples collected from three main producing areas (Linxia, Gannan, and Dingxi) in southeastern Gansu Province, China, to better identify its origin. We used inductively coupled plasma mass spectrometry (ICP-MS) and stable isotope ratio mass spectrometry (IRMS) to determine eight mineral elements (K, Mg, Ca, Zn, Cu, Mn, Cr, Al) and three stable isotopes (δ13C, δ15N, δ18O). Principal component analysis (PCA), partial least square discriminant analysis (PLS-DA) and linear discriminant analysis (LDA) were used to verify the validity of its geographical origin. Results K, Ca/Al, δ13C, δ15N and δ18O are important elements to distinguish A. sinensis sampled from Linxia, Gannan and Dingxi. We used an unsupervised PCA model to determine the dimensionality reduction of mineral elements and stable isotopes, which could distinguish the A. sinensis from Linxia. However, it could not easily distinguish A. sinensis sampled from Gannan and Dingxi. The supervised PLS-DA and LDA models could effectively distinguish samples taken from all three regions and perform cross-validation. The cross-validation accuracy of PLS-DA using mineral elements and stable isotopes was 84%, which was higher than LDA using mineral elements and stable isotopes. Conclusions The PLS-DA and LDA models provide a theoretical basis for tracing the origin of A. sinensis in three regions (Linxia, Gannan and Dingxi). This is significant for protecting consumers’ health, rights and interests.


Author(s):  
Claus Steppert ◽  
Isabel Steppert ◽  
Gunther Becher ◽  
William Sterlacci ◽  
Thomas Bollinger

Abstract There is an urgent need to screen patients for communicable viral diseases in order to cut infection chains. We recently demonstrated that MCC-IMS of breath can identify Influenza-A infected patients. With the Influenza epidemic receding and SARS-CoV-2 infections emerging, we extended our study to patients with suspected SARS-CoV-2 infection. 51 patients, 23m, 28f, aged 64 ± 16 years, were included in this study. Besides RT-PCR analysis of nasopharyngeal swabs, all patients underwent MCC-IMS analysis of breath. 16 patients, 7m, 9f, were positive for SARS-CoV-2 by RT-PCR. There was no difference in gender or age between the groups. Stepwise canonical discriminant analysis correctly classified 98% of the infected and non-infected subjects by cross-validation. Afterwards we combined the Influenza-A sub-study and the SARS-CoV-2 sub-study for a total of 75 patients, 34m, 41f, aged 64.8 ± 1.8 years, 14 positive for Influenza-A and 16 positive for SARS-CoV-2; the remaining 44 patients were used as controls. In one patient RT-PCR was highly suspicious of SARS-CoV-2 but inconclusive. There was no imbalance between the groups for age or gender. 97.3% of the patients could be correctly classified to their respective group by discriminant analysis. Even the inconclusive patient could be mapped to the SARS-CoV-2 group by applying the discrimination function. Conclusion MCC-IMS is able to detect SARS-CoV-2 infection and Influenza-A infection in breath. As this method provides exact, fast, non-invasive diagnosis, it should be further developed for screening of communicable viral diseases. Study registration: NCT04282135

